nuclear technology
SafeInfer: Context Adaptive Decoding Time Safety Alignment for Large Language Models
Banerjee, Somnath, Tripathy, Soham, Layek, Sayan, Kumar, Shanu, Mukherjee, Animesh, Hazra, Rima
Safety-aligned language models often exhibit fragile and imbalanced safety mechanisms, increasing the likelihood of generating unsafe content. In addition, incorporating new knowledge through editing techniques to language models can further compromise safety. To address these issues, we propose SafeInfer, a context-adaptive, decoding-time safety alignment strategy for generating safe responses to user queries. SafeInfer comprises two phases: the safety amplification phase, which employs safe demonstration examples to adjust the model's hidden states and increase the likelihood of safer outputs, and the safety-guided decoding phase, which influences token selection based on safety-optimized distributions, ensuring the generated content complies with ethical guidelines. Further, we present HarmEval, a novel benchmark for extensive safety evaluations, designed to address potential misuse scenarios in accordance with the policies of leading AI tech giants.
How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries
Banerjee, Somnath, Layek, Sayan, Hazra, Rima, Mukherjee, Animesh
In this study, we tackle a growing concern around the safety and ethical use of large language models (LLMs). Despite their potential, these models can be tricked into producing harmful or unethical content through various sophisticated methods, including 'jailbreaking' techniques and targeted manipulation. Our work zeroes in on a specific issue: to what extent LLMs can be led astray by asking them to generate responses that are instruction-centric, such as pseudocode, a program, or a software snippet, as opposed to vanilla text. To investigate this question, we introduce TechHazardQA, a dataset containing complex queries which should be answered in both text and instruction-centric formats (e.g., pseudocode), aimed at identifying triggers for unethical responses. We query a series of LLMs -- Llama-2-13b, Llama-2-7b, Mistral-V2 and Mistral 8X7B -- and ask them to generate both text and instruction-centric responses. For evaluation we report the harmfulness score metric as well as judgements from GPT-4 and humans. Overall, we observe that asking LLMs to produce instruction-centric responses increases unethical response generation by ~2-38% across the models. As an additional objective, we investigate the impact of model editing using the ROME technique, which further increases the propensity for generating undesirable content. In particular, asking edited LLMs to generate instruction-centric responses further increases unethical response generation by ~3-16% across the different models.
We Can Prevent AI Disaster Like We Prevented Nuclear Catastrophe
On 16th July 1945 the world changed forever. The Manhattan Project's 'Trinity' test, directed by Robert Oppenheimer, endowed humanity for the first time with the ability to wipe itself out: an atomic bomb had been successfully detonated 210 miles south of Los Alamos, New Mexico. On 6th August 1945 a bomb was dropped on Hiroshima and, three days later, on Nagasaki, unleashing unprecedented destructive power. The end of World War II brought a fragile peace, overshadowed by this new, existential threat. While nuclear technology promised an era of abundant energy, it also launched us into a future where nuclear war could lead to the end of our civilization.
AI Desperately Needs Global Oversight
Every time you post a photo, respond on social media, make a website, or possibly even send an email, your data is scraped, stored, and used to train generative AI technology that can create text, audio, video, and images with just a few words. This has real consequences: OpenAI researchers studying the labor market impact of their language models estimated that approximately 80 percent of the US workforce could have at least 10 percent of their work tasks affected by the introduction of large language models (LLMs) like ChatGPT, while around 19 percent of workers may see at least half of their tasks impacted. In other words, the data you created may be putting you out of a job. When a company builds its technology on a public resource--the internet--it's sensible to say that that technology should be available and open to all. But critics have noted that GPT-4 lacked any clear information or specifications that would enable anyone outside the organization to replicate, test, or verify any aspect of the model.
How Artificial Intelligence Threatens World Peace - AI Summary
In this one, I want to discuss how there is potential for a new conflict not dissimilar to the Cold War with the development and proliferation of nuclear energy; but this time AI will take centre stage of the theatre. Medical -- Nuclear technology has been harnessed in various medical applications ranging from imaging to killing tumors and sanitizing surgical equipment. "Both military and commercial robots will … incorporate 'artificial intelligence' (AI) that could make them capable of undertaking … missions of their own." Professor Lennox writes that many commanders in the American military are voicing concern about relinquishing control over AI systems tasked with identifying, seeking and eliminating human targets. Just as nuclear energy has proved rewarding in other fields, artificial intelligence has its better and worse applications.
Sawtooth Supercomputer Coming to INL's Collaborative Computing Center
IDAHO FALLS, Idaho, Dec. 5, 2019 – A powerful new supercomputer arrived this week at Idaho National Laboratory's Collaborative Computing Center. The machine has the power to run complex modeling and simulation applications, which are essential to developing next-generation nuclear technologies. Named after a central Idaho mountain range, Sawtooth arrives in December and will be available to users early next year. That is the highest ranking reached by an INL supercomputer. Of 102 new systems added to the list in the past six months, only three were faster than Sawtooth.
AI for Peace - War on the Rocks
This article was submitted in response to the call for ideas issued by the co-chairs of the National Security Commission on Artificial Intelligence, Eric Schmidt and Robert Work. It addresses the fourth question (part a.) which asks what international norms for artificial intelligence should the United States lead in developing, and whether it is possible to create mechanisms for the development and enforcement of AI norms. In 1953, President Dwight Eisenhower asked the world to join him in building a framework for "Atoms for Peace." He made the case for a global agreement to prevent the spread of nuclear weapons while also sharing the peaceful uses of nuclear technology for power, agriculture, and medicine. No one would argue the program completely prevented the spread of weapons technology: India and Pakistan used technology gained through Atoms for Peace in their nascent nuclear weapons programs.
How Confucianism Could Put Fears About Artificial Intelligence to Bed - OZY
When Arnold Schwarzenegger said "I'll be back" in The Terminator, he probably didn't realize the film would keep coming back in discussions about robots and artificial intelligence. Yet 35 years after Schwarzenegger portrayed a cyborg assassin from an AI-dominated future, much of Western discourse on robots is repeating a Terminator-like scenario: panic that robots will take our jobs, and that AI will take over the world, Skynet-style. Western culture has had a long history of individualism, warlike use of technology, Christian apocalyptic thinking and a strong binary between body and soul. These elements might explain the West's obsession with the technological apocalypse and its opposite: techno-utopianism. In Asia, it's now common to explain China's dramatic rise as a leader in AI and robotics as a consequence of state support from the world's largest economy.
What's Next for Artificial Intelligence
The traditional definition of artificial intelligence is the ability of machines to execute tasks and solve problems in ways normally attributed to humans. Some tasks that we consider simple--recognizing an object in a photo, driving a car--are incredibly complex for AI. Machines can surpass us when it comes to things like playing chess, but those machines are limited by the manual nature of their programming; a $30 gadget can beat us at a board game, but it can't do--or learn to do--anything else. This is where machine learning comes in. Show millions of cat photos to a machine, and it will hone its algorithms to improve at recognizing pictures of cats.